Abstract

Social networks are organized into communities with dense internal connections, 
giving rise to high values of the clustering coefficient. In addition, these networks 
have been observed to be assortative, i.e. highly connected vertices tend to connect 
to other highly connected vertices, and have broad degree distributions. We present 
a model for an undirected growing network which reproduces these characteristics, 
with the aim of efficiently producing very large networks to be used as platforms
for studying sociodynamic phenomena. The communities arise from a mixture of
random attachment and implicit preferential attachment. The structural properties
of the model are studied analytically and numerically, using the k-clique method for
quantifying the communities. 
1 Introduction



The recent substantial interest in the structural and functional properties of
complex networks (for reviews, see […]) has been partially stimulated
by attempts to understand the characteristics of social networks, such as the
small-world property and a high degree of clustering […]. Before this, social
networks had been studied intensively by social scientists […] for several
decades in order to understand both local phenomena, such as clique formation
and its dynamics, and network-wide processes, such as the transmission
of information. Within the framework of complex networks, studies have
concentrated on the structural analysis of various types of social networks, such
as those related to sexual contacts […], professional collaboration […, 13]
and Internet dating […], as well as on models of collective behaviour and various
sociodynamic phenomena […]. One feature of particular interest has
been to evaluate and detect community structure in networks […],
where the developed methodologies have found applications in various other
fields such as systems biology […]. Communities can, roughly speaking, be
defined as sets of vertices with dense internal connections, such that the
inter-community connections are relatively sparse. In everyday social life or
professional collaborations, people tend to form communities, the existence of
which is a prominent characteristic of social networks and has far-reaching
consequences for the processes taking place on them, such as the propagation of
information and opinion formation.

It is evident that theoretical studies of processes and collective behaviour
taking place on social networks would benefit from realistic social network
models. Essential characteristics of social networks are believed to include
assortative mixing […], high clustering, short average path lengths, broad
degree distributions [22, …], and the existence of community structure.
Here, we propose a new model that exhibits all the above characteristics.
So far, different approaches have been taken to define social network
models […]. To our knowledge, of the above models, […] exhibits
community structure, high clustering and assortativity¹, but, based on the
visualizations given in that paper, its community structure appears very different
from that of the proposed model. Our model belongs to the class of growing network
models, i.e. all edges are generated in connection with new vertices joining
the network. Network growth is governed by two processes: 1) attachment to
random vertices, and 2) attachment to the neighbourhood of the random
vertices ("getting to know friends of friends"), giving rise to implicit preferential
attachment. Under certain conditions, these processes give rise to broad
degree distributions, high clustering coefficients, strong positive degree-degree
correlations and community structure.

This paper is structured as follows: First, we motivate the model based on
real-world observations, followed by a description of the network growth algorithm.
Next, we derive approximate expressions for the degree distribution and
clustering spectrum and compare our theoretical results to simulations. We also
present numerical results for the degree-degree correlations. We then address
the issue of community structure using the k-clique method […]. Finally, we
conclude with a brief summary of our results.



1 The model presented in [23] also exhibits community structure and high clustering,
but weak assortativity, with assortative mixing coefficients of the order of 0.01.
2 Model



2.1 Motivation for the model 

Our basic aim has been to develop a model which a) captures the salient
features of real-world social networks, and b) is as simple as possible, and
simple enough to allow approximate analytical derivations of the fundamental
characteristics, although one of the desired structural characteristics
(positive degree-degree correlations) makes exact derivations rather difficult. Our
interest lies in the resulting network rather than in the growth mechanism itself.

To satisfy the first criterion, we have set the following requirements for the
main characteristics of networks generated by our model: i) due to limited
social resources, the degree distribution $p(k)$ should have a steep tail […];
ii) average path lengths should grow slowly with network size; iii) the networks
should exhibit high average clustering; iv) the networks should display
positive degree-degree correlations, i.e. be assortative; v) the networks should
contain communities with dense internal connections.

Requirement i) is based on the observation that many social interaction
networks display power-law-like degree distributions but may display a cutoff
at large degrees […, 13]. In some cases, degree exponents beyond the
commonly expected range $2 < \gamma < 3$ have been observed; e.g., in the PGP web
of trust [23] a power-law-like tail with exponent $\gamma = 4$ has been observed.
Similar findings have also been made in a study based on a very large mobile
phone call dataset […]. In light of these data, we will be satisfied with a model
that produces either steep power laws or a cutoff at high degrees. In the case
of everyday social networks, common sense tells us that even in very large
networks, no person can have tens of thousands of acquaintances. Hence, if
the degree distribution is to be asymptotically scale-free, $p(k) \propto k^{-\gamma}$, the value
of the exponent $\gamma$ should be above the commonly observed range $2 < \gamma < 3$,
such that in networks of realistic sizes, $N > 10^6$ vertices, the maximum degree
remains limited², $k_{max} \sim 10^2$. As detailed later, such power-law distributions can be
attributed to growth processes mixing random and preferential attachment.
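Footnote 2 makes this size argument quantitative. As a rough worked example that ignores prefactors, for a network of $N = 10^6$ vertices:

$$k_{max} \sim N^{1/(\gamma-1)}: \qquad \gamma = 2.5 \;\Rightarrow\; k_{max} \sim 10^{6/1.5} = 10^{4}, \qquad \gamma = 4 \;\Rightarrow\; k_{max} \sim 10^{6/3} = 10^{2}.$$

An exponent inside the usual range thus allows hubs with tens of thousands of contacts, while $\gamma = 4$ keeps the largest degree around a hundred.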

Requirement ii), short average path lengths, is a common characteristic
observed in natural networks, including social networks. Requirements iii) high
clustering, iv) assortativity, and v) existence of communities are also based
on existing observations, and can be attributed to "local" edge formation, i.e.
edges formed between vertices within short distances of each other. The degree of clustering
is typically measured using the average clustering coefficient $\langle c \rangle$, defined as the
network average of $c = 2E/[k(k-1)]$, where $E$ is the number of triangles
around a vertex of degree $k$ and the factor $\frac{1}{2}k(k-1)$ gives the maximum
number of such triangles. A commonly used measure of degree-degree
correlations is the average nearest-neighbour degree spectrum $k_{nn}(k)$: if $k_{nn}(k)$
has a positive slope, high-degree vertices tend to be connected to other
high-degree vertices, i.e. the vertex degrees in the network are assortatively mixed
(see, e.g., Ref. [22]). For detecting and characterizing communities, several
methods have been proposed […]. In social networks, each
individual can be assigned to several communities, and thus we have chosen to
investigate the community structure of our model networks using a method
which allows membership in several communities […].

2 For networks with a scale-free tail of the degree distribution, $k_{max} \sim N^{1/(\gamma-1)}$.

To satisfy the second criterion, we have chosen a growing network model, since
this allows using the rate equation approach [33, 34], and because even very
large networks can be produced using a simple and quick algorithm. It has been
convincingly argued […] that since the number of vertices in a social network
changes at a very slow rate compared to the number of edges, a realistic social network
model should feature a fixed number of vertices with a varying number and
configuration of edges. However, as our aim is merely to provide a model
generating substrate networks for future studies of sociodynamic phenomena,
whose time scales can be viewed as much shorter than the time scales
of changes in the network structure, a model where the networks are grown
to the desired size and then considered static is suitable for our purposes.



2.2 Model algorithm 



The algorithm consists of two growth processes: 1) random attachment, and
2) implicit preferential attachment resulting from following edges from the
randomly chosen initial contacts. The local nature of the second process gives rise
to high clustering, assortativity and community structure. As will be shown
below, the degree distribution is determined by the number of edges
generated by the second process for each random attachment. The model
algorithm reads as follows³:

(1) Start with a seed network of N vertices.

(2) Pick on average $m_r \geq 1$ random vertices as initial contacts.

(3) Pick on average $m_s > 0$ neighbours of each initial contact as secondary
contacts.

(4) Connect the new vertex to the initial and secondary contacts.

(5) Repeat steps 2 to 4 until the network has grown to the desired size.

3 Our network growth mechanism bears some similarity to the Holme-Kim model,
designed to produce scale-free networks with high clustering […]. In the HK model,
the networks are grown with two processes: preferential attachment and triangle
formation by connections to the neighbourhood. However, the structural properties
of networks generated by our model differ considerably from those of HK model networks
(e.g. in terms of assortativity and community structure).




Fig. 1. Growth process of the network. The new vertex v links to one or more
randomly chosen initial contacts (here i, j) and possibly to some of their neighbours
(here k, l). Roughly speaking, the neighbourhood connections contribute to the
formation of communities, while the new vertex acts as a bridge between communities
if more than one initial contact was chosen.



Fig. 2. A visualization of a small network with $N = 500$ indicates strong community
structure, with communities of various sizes clearly visible. The number of initial
contacts is distributed as $p(n_{init} = 1) = 0.95$, $p(n_{init} = 2) = 0.05$, and the number of
secondary contacts from each initial contact as $n_{2nd} \sim U[0, 3]$ (uniformly distributed
between 0 and 3). The network was grown from a chain of 30 vertices. Visualization
was done using Himmeli […].

The analytical calculations detailed in the next section use the expectation
values $m_r$ and $m_s$. For the implementation, any non-negative distributions
for the numbers of initial and secondary contacts can be chosen, provided they have
these expectation values. If the distribution for the number of secondary contacts
has a long tail, it will often happen that the number of attempted secondary contacts
is higher than the degree of the initial contact, so that not all attempted contacts
can take place, which will bias the degree distribution towards smaller degrees. We call this the
saturation effect, since it is caused by all the neighbours of an initial contact
being used up, or saturated. However, for the distributions of $n_{2nd}$ used in this
paper, the saturation effect appears to have little effect on the degree
distribution.

For appreciable community structure to form, it is essential that the number
of links made to the neighbours of an initial contact varies, instead of always
linking to one or all of the neighbours, and that sometimes more than one initial
contact is chosen, to form 'bridges between communities'. Here, we use the
discrete uniform distributions $n_{2nd} \sim U[0, k]$, $k = 1, 2, 3$, for the number of
secondary contacts $n_{2nd}$, while for the number of initial contacts $n_{init}$ we usually
fix the probabilities to be $p_1 = 0.95$ for picking one contact and $p_2 = 0.05$ for
picking two. This results in sparse connectivity between the communities. The
uniform distributions for $n_{2nd}$ were chosen for simplicity, but allowing larger
$n_{2nd}$ would allow larger cliques and stronger communities to form.
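The growth procedure of steps (1) to (5), with the distributions just described, can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function `grow_network` and its parameters are hypothetical, the seed chain of 30 vertices follows Fig. 2, and duplicate contacts (e.g. a secondary contact that is also the other initial contact) are simply merged, which is an assumption.

```python
import random


def grow_network(n_final, p_two_init=0.05, max_secondary=3, seed_size=30, rng_seed=None):
    """Grow an undirected network by random + neighbourhood attachment.

    Hypothetical helper: n_init is 2 with probability p_two_init and 1
    otherwise; n_2nd ~ U[0, max_secondary] (inclusive) per initial contact;
    the seed is a chain of seed_size vertices. Returns a list of neighbour sets.
    """
    rng = random.Random(rng_seed)
    # Step (1): seed network, here a simple chain 0 - 1 - ... - (seed_size - 1).
    adj = [set() for _ in range(seed_size)]
    for v in range(seed_size - 1):
        adj[v].add(v + 1)
        adj[v + 1].add(v)

    # Step (5): repeat steps (2)-(4) until the network reaches n_final vertices.
    while len(adj) < n_final:
        # Step (2): pick the initial contacts uniformly at random.
        n_init = 2 if rng.random() < p_two_init else 1
        initial = rng.sample(range(len(adj)), n_init)
        contacts = set(initial)
        # Step (3): pick n_2nd neighbours of each initial contact.
        for u in initial:
            n_2nd = rng.randint(0, max_secondary)
            nbrs = sorted(adj[u])
            # Saturation: at most deg(u) secondary contacts can be drawn.
            contacts.update(rng.sample(nbrs, min(n_2nd, len(nbrs))))
        # Step (4): add the new vertex and link it to every contact.
        new = len(adj)
        adj.append(contacts)
        for u in contacts:
            adj[u].add(new)
    return adj
```

For example, `grow_network(100_000, rng_seed=1)` returns a list of neighbour sets. With the default parameters, $m_r = 1.05$ and $m_s = 1.5$, so the mean-field mean degree is $2m_r(1+m_s) = 5.25$; the measured value should come out slightly lower, because saturation and merged contacts remove some edges.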

2.3 Vertex degree distribution 

We will use the standard mean-field rate equation method […] to derive an
approximate expression for the vertex degree distribution. For growing network
models mixing random and preferential attachment, power-law degree
distributions $p(k) \sim k^{-\gamma}$ with exponents $2 < \gamma < \infty$ have been derived in,
e.g., […]⁴. Since in our model the newly added links always emanate
from the new vertex, the lower bound for the degree exponent is 3; by
contrast, if links are allowed to form between existing vertices in the network, the
exponent can also take values between 2 and 3 (see, e.g., […]).

If no degree correlations were present, choosing the vertex at the other end of
a randomly selected edge would correspond to linear preferential selection.
In our model networks, however, degree correlations are present, leading to a bias away from pure
preferential attachment. Qualitatively, this can be explained as follows: a
low-degree vertex will on average have low-degree neighbours. Therefore,
starting from a low-degree vertex (such vertices are the most numerous in the network)
and proceeding to its neighbourhood, we are more likely to reach low-degree
vertices than their proportion in the network would imply. Hence, the hubs
gain fewer links than they would under pure preferential attachment. Due to
degree-degree correlations, then, the simulated curves will not closely match
the theory, but at high values of $k$ the theoretical distributions can be viewed
as an upper limit to the average maximum degrees.

4 The same result is found for generalized linear preferential attachment kernels
$\pi_k \propto k + k_0$, where $k_0$ is a constant, since mixing random and preferential attachment
can be recast as preferential attachment with a shifted kernel.
We first construct the rate equation which describes how the degree of a vertex
changes on average during one time step of the network growth process. The
degree of a vertex $v_i$ grows via two processes: 1) a new vertex links directly
to $v_i$ (the probability of this happening is $m_r/t$, since there are altogether $\sim t$
vertices at time $t$, and $m_r$ random initial contacts are picked); 2) vertex $v_i$ is
selected as a secondary contact. In the following derivations we assume that
the probability of 2) is linear with respect to vertex degree, i.e. following a
random edge from a randomly selected vertex gives rise to implicit preferential
attachment. Note that in this approximation we neglect the effects of
correlations between the degrees of neighbouring vertices. On average, $m_s$ neighbours
of each of the $m_r$ initial contacts are selected to be secondary contacts. These two
processes lead to the following rate equation for the degree of vertex $v_i$:

$$\frac{dk_i}{dt} = m_r\left(\frac{1}{t} + m_s\frac{k_i}{\sum_k k}\right) = \frac{1}{t}\left(m_r + \frac{m_s k_i}{2(1+m_s)}\right), \tag{1}$$

where we substituted $2m_r(1+m_s)t$ for $\sum_k k$, based on the facts that the average
initial degree of a vertex is $k_{init} = m_r(1+m_s)$, so that each time step adds
$m_r(1+m_s)$ edges on average, and that the contribution of the seed to the network
size can be ignored. Separating variables and integrating (from $t_i$ to $t$, and
from $k_{init}$ to $k_i$), we get the following time evolution for the vertex
degrees:

$$k_i(t) = B\left(\frac{t}{t_i}\right)^{1/A} - C, \tag{2}$$

where $A = 2(1+m_s)/m_s$, $B = m_r(A + 1 + m_s)$, and $C = A m_r$.
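Equation (2) can be checked numerically. In logarithmic time $s = \ln t$, Eq. (1) becomes the autonomous linear equation $dk_i/ds = m_r + k_i/A$, which the Python sketch below integrates with a classical fourth-order Runge-Kutta scheme and compares with the closed form. The integrator is our choice, not part of the derivation, and the function names are hypothetical.

```python
import math


def abc_constants(m_r, m_s):
    """Constants A, B, C defined below Eq. (2)."""
    A = 2.0 * (1.0 + m_s) / m_s
    B = m_r * (A + 1.0 + m_s)
    C = A * m_r
    return A, B, C


def k_closed_form(t, t_i, m_r, m_s):
    """Mean-field degree k_i(t) of a vertex added at time t_i, Eq. (2)."""
    A, B, C = abc_constants(m_r, m_s)
    return B * (t / t_i) ** (1.0 / A) - C


def k_numerical(t, t_i, m_r, m_s, steps=1000):
    """Integrate Eq. (1) with RK4 in s = ln t, where dk/ds = m_r + m_s k / (2 (1 + m_s))."""
    f = lambda k: m_r + m_s * k / (2.0 * (1.0 + m_s))
    k = m_r * (1.0 + m_s)          # initial degree k_init at t = t_i
    h = math.log(t / t_i) / steps  # step size in log time
    for _ in range(steps):
        k1 = f(k)
        k2 = f(k + 0.5 * h * k1)
        k3 = f(k + 0.5 * h * k2)
        k4 = f(k + h * k3)
        k += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return k
```

At $t = t_i$ the closed form returns $B - C = m_r(1+m_s) = k_{init}$, as required by the initial condition.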

From the time evolution of the vertex degree $k_i(t)$ we can calculate the degree
distribution $p(k)$ by forming the cumulative distribution $F(k)$ and differentiating
with respect to $k$. Since in the mean-field approximation the degree $k_i(t)$ of a
vertex $v_i$ increases strictly monotonically from the time $t_i$ at which the vertex is
added to the network, the fraction of vertices whose degree is less than $k_i(t)$
at time $t$ is equal to the fraction of vertices that were introduced after
time $t_i$. Since one vertex is added per time step, the arrival times are evenly
distributed, and this fraction is $(t - t_i)/t$. These facts lead
to the cumulative distribution

$$F(k_i) = P(\tilde{k} < k_i) = P(\tilde{t} > t_i) = \frac{1}{t}(t - t_i). \tag{3}$$

Solving (2) for $t_i = t_i(k_i, t) = B^A (k_i + C)^{-A} t$, inserting it into (3),
differentiating $F(k_i)$ with respect to $k_i$, and replacing the notation $k_i$ by $k$
in the resulting equation, we get the probability density distribution for the
degree $k$:

$$p(k) = A B^A (k + C)^{-2/m_s - 3}, \tag{4}$$

where $A$, $B$ and $C$ are as above (note that $A + 1 = 3 + 2/m_s$). Hence, in the limit of large $k$, the distribution
becomes a power law $p(k) \propto k^{-\gamma}$ with $\gamma = 3 + 2/m_s$, $m_s > 0$, leading to
$3 < \gamma < \infty$. In the model, $\gamma = 3$ can never be reached due to the random
component of attachment. However, when the importance of the random connections is
diminished relative to the implicit preferential component by increasing
$m_s$, the theoretical degree exponent approaches the limit 3, the value
resulting from pure preferential attachment.
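The following sketch evaluates Eq. (4), its cumulative form $F(k) = 1 - B^A (k + C)^{-A}$ obtained from Eqs. (2) and (3), and the exponent $\gamma = 3 + 2/m_s$. For the discrete uniform distributions $n_{2nd} \sim U[0, k]$ used here, $m_s = k/2$, so $U[0,1]$, $U[0,2]$ and $U[0,3]$ give $\gamma = 7$, $5$ and $4.33$. The function names are hypothetical.

```python
def degree_exponent(m_s):
    """Asymptotic exponent gamma = 3 + 2/m_s of Eq. (4)."""
    return 3.0 + 2.0 / m_s


def degree_pdf(k, m_r, m_s):
    """Mean-field density p(k) = A B^A (k + C)^(-2/m_s - 3), Eq. (4)."""
    A = 2.0 * (1.0 + m_s) / m_s
    B = m_r * (A + 1.0 + m_s)
    C = A * m_r
    return A * B ** A * (k + C) ** (-2.0 / m_s - 3.0)


def degree_cdf(k, m_r, m_s):
    """F(k) = 1 - t_i/t = 1 - (B / (k + C))^A, from Eqs. (2) and (3)."""
    A = 2.0 * (1.0 + m_s) / m_s
    B = m_r * (A + 1.0 + m_s)
    C = A * m_r
    return 1.0 - (B / (k + C)) ** A


# For n_2nd ~ U[0, k_max] (discrete uniform), the mean is m_s = k_max / 2.
for k_max in (1, 2, 3):
    m_s = k_max / 2
    print(f"U[0,{k_max}]: m_s = {m_s}, gamma = {degree_exponent(m_s):.2f}")
```

Since $k_{init} + C = B$, the cumulative distribution vanishes at $k = k_{init}$, so Eq. (4) is normalized on $k \geq k_{init}$.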



2.4 Clustering spectrum 



The dependence of the clustering coefficient on vertex degree can also be found
by the rate equation method […]. Let us examine how the number of triangles
$E_i$ around a vertex $v_i$ changes with time. The triangles around $v_i$ are mainly
generated by two processes: 1) vertex $v_i$ is chosen as one of the initial contacts
with probability $m_r/t$, and the new vertex links to some of its neighbours (we
assume $m_s$ on average, although sometimes this is limited by the number of
neighbours the initial contact has, i.e. saturation); 2) vertex $v_i$ is selected
as a secondary contact, and a triangle is formed between the new vertex,
the initial contact and the secondary contact. Note that triangles can also be
generated by selecting two neighbouring vertices as the initial contacts, but
to a first approximation the contribution of this process is negligible. These two
processes are described by the rate equation

$$\frac{dE_i(k_i, t)}{dt} = \frac{m_r m_s}{t} + m_r m_s \frac{k_i}{\sum_k k} = \frac{dk_i}{dt} + \frac{m_r(m_s - 1)}{t}, \tag{5}$$

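where the second equality is obtained by applying Eq. (1). Integrating
both sides with respect to $t$, and using the initial conditions $k_i(t_i) = k_{init} = m_r(1+m_s)$
and $E_i(t_i) = m_r m_s$ (a new vertex forms one triangle with each of its secondary
contacts and the corresponding initial contact), we get the time evolution of the
number of triangles around a vertex $v_i$ as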

$$E_i(t) = k_i(t) + m_r(m_s - 1)\ln\left(\frac{t}{t_i}\right) - m_r. \tag{6}$$



We can now make use of the previously found dependence of $k_i$ on $t_i$ for finding
$c_i(k_i)$. Solving (2) for $\ln(t/t_i) = A\ln[(k_i + C)/B]$, inserting it into (6) to get
$E_i(k_i)$, and dividing $E_i(k_i)$ by the maximum possible number of triangles,
$k_i(k_i - 1)/2$, we arrive at the clustering coefficient:

$$c_i(k_i) = \frac{2E_i(k_i)}{k_i(k_i - 1)} = \frac{2\left[k_i + D\ln(k_i + C) - F\right]}{k_i(k_i - 1)}, \tag{7}$$

where $C = A m_r$, $D = C(m_s - 1)$, and $F = D\ln B + m_r$. For large values of
degree $k$, the clustering coefficient thus behaves as $c(k) \approx 2/k$, i.e. $c(k) \sim 1/k$.
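Equation (7) is straightforward to evaluate. The Python sketch below (hypothetical function name) computes $c(k)$ from the constants $A$, $B$, $C$, $D$ and $F$ defined above. Two useful consistency checks are that the implied triangle count at $k = k_{init}$ equals the initial condition $m_r m_s$, and that $k\,c(k) \to 2$ for large $k$.

```python
import math


def clustering_theory(k, m_r, m_s):
    """Mean-field clustering spectrum c(k) of Eq. (7)."""
    A = 2.0 * (1.0 + m_s) / m_s
    B = m_r * (A + 1.0 + m_s)
    C = A * m_r
    D = C * (m_s - 1.0)
    F = D * math.log(B) + m_r
    triangles = k + D * math.log(k + C) - F  # E(k) from Eqs. (2) and (6)
    return 2.0 * triangles / (k * (k - 1.0))


m_r = 1.05  # p(n_init = 1) = 0.95, p(n_init = 2) = 0.05
for m_s in (0.5, 1.0, 1.5):  # n_2nd ~ U[0, 1], U[0, 2], U[0, 3]
    print(m_s, [round(k * clustering_theory(k, m_r, m_s), 3) for k in (10, 100, 1000)])
```

Printing $k\,c(k)$ rather than $c(k)$ makes the approach to the $2/k$ asymptote easy to see.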
2.5 Comparison of theory and simulations

Fig. 3 displays the degree distributions averaged over 100 runs for networks
of size $N = 10^6$ for various parametrizations, together with analytical curves
calculated using Eq. (4). The analytical distributions asymptotically approach
power laws $p(k) \propto k^{-\gamma}$ with exponents (from top to bottom) $\gamma = 5$, $4.33$, $5$ and
$7$. The tails of the simulated distributions fall below the theoretical predictions
due to degree correlations, as explained earlier. The degree-degree correlations
were confirmed as the cause of the deviation by replacing the attachment to
secondary contacts with pure (uncorrelated) preferential attachment, after which the
simulated and theoretical slopes matched very closely (not shown). Note that
the parameter values shown here were chosen for simplicity, and they could
be tuned to obtain different properties.



Fig. 3. Degree distributions of simulated networks of size $N = 10^6$, averaged over
100 runs each. Due to degree-degree correlations in the network, linking to the
neighbourhood of a vertex does not strictly lead to preferential attachment, which
causes the distributions to fall below the theoretical power laws (solid lines) at
large $k$. Curves are vertically translated a decade apart for clarity. Inset: the ratio
of simulated values to theoretical ones. Markers correspond to different parameter
values: (+): number of initial contacts $n_{init}$ from the discrete uniform distribution
$U[1, 3]$, number of secondary contacts $n_{2nd}$ from $U[0, 2]$. (o): $p(n_{init} = 1) = 0.95$,
$p(n_{init} = 2) = 0.05$, $n_{2nd} \sim U[0, 3]$. (x): $p(n_{init} = 1) = 0.95$, $p(n_{init} = 2) = 0.05$,
$n_{2nd} \sim U[0, 2]$. (□): $p(n_{init} = 1) = 0.95$, $p(n_{init} = 2) = 0.05$, $n_{2nd} \sim U[0, 1]$.
The top panel of Fig. 4 displays averaged values of the clustering coefficient
$c(k)$ for the same networks, together with analytical curves calculated using
Eq. (7). We see that the predictions match the simulated results well, and
the $c(k) \sim 1/k$ trend is clearly visible. The corresponding network-averaged
clustering coefficients are (top to bottom) $\langle c \rangle = 0.30$, $0.58$, $0.54$ and $0.43$,
i.e. the degree of clustering is relatively high. Of these parameter sets, (o)
allows the largest number of links from each initial contact, therefore giving
the largest average clustering. Higher clustering coefficients could be obtained
by increasing the possible number of secondary contacts.
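For comparison with Eq. (7), $c(k)$ and $\langle c \rangle$ can be measured on a simulated network. The sketch below assumes the network is stored as a list of neighbour sets and assigns $c_i = 0$ to vertices of degree below 2; that convention is an assumption and slightly affects the value of $\langle c \rangle$. The function names are hypothetical.

```python
from collections import defaultdict


def local_clustering(adj):
    """c_i = 2 E_i / (k_i (k_i - 1)) for each vertex of a list of neighbour sets.

    Convention (an assumption): vertices with k_i < 2 get c_i = 0.
    """
    c = []
    for v, nbrs in enumerate(adj):
        k = len(nbrs)
        if k < 2:
            c.append(0.0)
            continue
        # E_i = number of edges among the neighbours of v; summing the
        # neighbour overlaps counts each such edge twice.
        e = sum(len(adj[u] & nbrs) for u in nbrs) // 2
        c.append(2.0 * e / (k * (k - 1)))
    return c


def clustering_spectrum(adj):
    """c(k): mean of c_i over all vertices with degree k."""
    groups = defaultdict(list)
    for v, ci in enumerate(local_clustering(adj)):
        groups[len(adj[v])].append(ci)
    return {k: sum(vals) / len(vals) for k, vals in sorted(groups.items())}
```

The network average is then `sum(local_clustering(adj)) / len(adj)`.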




Fig. 4. Top: Clustering coefficient $c(k)$, averaged over 100 runs for networks of
size $N = 10^6$. Predictions for $c(k)$ (solid lines) agree well with simulated results.
Curves are vertically translated a decade apart for clarity. Inset: the ratio of
simulation results to theory. Bottom left: Average nearest-neighbour degree $k_{nn}(k)$ for the
same networks, displaying a signature of assortative mixing. Bottom right: average
shortest path lengths grow logarithmically with network size. (+): number of initial
contacts from $U[1, 3]$, secondary contacts from $U[0, 2]$. Markers correspond to the
same parameters as in Fig. 3.
2.6 Degree-degree correlations and average shortest path lengths

Next, we investigate the degree-degree correlations of our model networks.
Social networks are often associated with assortative mixing […] with respect to
vertex degrees, i.e. high-degree vertices tend to connect to other high-degree
vertices. This tendency can be formulated in terms of the conditional probability
$P(k'|k)$ that an edge connected to a vertex of degree $k$ has a vertex of degree
$k'$ at its other end [22]. A quantity more suitable for numerical investigations
is the average nearest-neighbour degree $k_{nn}(k) = \sum_{k'} k' P(k'|k)$. If $k_{nn}(k)$ is
an increasing function of $k$, the network is assortatively mixed in terms of
vertex degrees. The bottom left panel of Fig. 4 shows $k_{nn}(k)$ averaged over 100
networks, displaying a clear signature of assortative mixing. Another measure
of degree-degree correlations is the assortativity coefficient $r$ […], which is the
Pearson correlation coefficient of the vertex degrees at either end of an edge. For
the model networks generated with the parameters used in this paper, the
coefficients are (+): 0.18, (o): 0.10, (x): 0.10, and (□): 0.09. For different
co-authorship networks, for example, the assortativity coefficient has been found
to range from 0.12 to 0.36 […].
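Both degree-correlation measures can be computed directly from an adjacency list. In this sketch (hypothetical function names, neighbour-set representation assumed), $k_{nn}(k)$ is obtained by averaging each vertex's mean neighbour degree over all vertices of degree $k$, which is equivalent to $\sum_{k'} k' P(k'|k)$ because every such vertex contributes exactly $k$ edge ends; $r$ is the Pearson coefficient over both orientations of every edge.

```python
import math
from collections import defaultdict


def knn_spectrum(adj):
    """k_nn(k): average neighbour degree, averaged over vertices of degree k."""
    groups = defaultdict(list)
    for nbrs in adj:
        if nbrs:
            groups[len(nbrs)].append(sum(len(adj[u]) for u in nbrs) / len(nbrs))
    return {k: sum(x) / len(x) for k, x in sorted(groups.items())}


def assortativity(adj):
    """Pearson correlation r of the degrees at the two ends of each edge.

    Every undirected edge is counted in both directions, so both ends share the
    same mean and variance. Returns NaN if that variance is zero.
    """
    xs, ys = [], []
    for v, nbrs in enumerate(adj):
        for u in nbrs:
            xs.append(len(adj[v]))
            ys.append(len(adj[u]))
    n = len(xs)
    mean = sum(xs) / n  # identical for xs and ys by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var > 0 else math.nan
```

A star graph is a useful sanity check: its hub connects only to leaves, so it is perfectly disassortative and gives $r = -1$.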

Qualitatively, the presence of positive degree-degree correlations can be
attributed to the neighbourhood connections, as well as to the high degree of
clustering. Consider a situation where a new vertex attaches to one initial contact
$v_i$ and $m_s$ of its neighbours, so that the degrees of all the vertices in question are
increased by one. Hence, positive correlations are induced between the degrees
of $v_i$ and its $m_s$ neighbours. In addition, because of the high clustering, there
is a large probability of connections among the $m_s$ neighbours. This gives
rise to positive degree correlations among these $m_s$ vertices as well.

It is commonly observed in real-life networks that average path lengths are
short with respect to network size […]. Together with high clustering, this is
called the small-world effect. Typically in model networks, the shortest path
lengths are found to grow logarithmically with network size. This is also the
case in our model (Fig. 4, bottom right panel).
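The average shortest path length can be measured with breadth-first search. For networks with $N = 10^6$ vertices, running BFS from every vertex is expensive, so the sketch below optionally samples a subset of source vertices; that sampling is an assumption of this illustration, not a procedure stated in the text, and the function name is hypothetical.

```python
import random
from collections import deque


def average_shortest_path(adj, n_sources=None, rng_seed=None):
    """Mean BFS distance over pairs of distinct, mutually reachable vertices.

    If n_sources is given, BFS is run only from that many randomly chosen
    sources, which gives an estimate instead of the exact average.
    """
    n = len(adj)
    if n_sources is not None and n_sources < n:
        sources = random.Random(rng_seed).sample(range(n), n_sources)
    else:
        sources = range(n)
    total, pairs = 0, 0
    for s in sources:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs
```

Measuring this quantity for networks of increasing $N$ and plotting it against $\ln N$ is one way to check the logarithmic growth shown in Fig. 4.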



2.7 Community structure

The emergence of communities in the networks generated by our model can
be attributed to the effects of the two types of attachment. Roughly speaking,
attachment to the secondary contacts tends to enlarge existing communities:
the new vertex creates triangles with the initial contact and its nearest
neighbours. If the internal connections within an existing community are dense, the
secondary contacts tend to be members of the same community, and thus this
community grows. On the other hand, new vertices joining the network may
attach to several initial contacts (with our parametrizations, two or three).
If these belong to different communities, the new vertex assumes the role of
a "bridge" between them. However, no edges are added between the vertices
already in the network. Therefore, the maximum size of a clique, i.e. a fully
connected subgraph, to be found in the network is limited by the maximum
number of edges added per time step. In this model the number of added edges
varies, allowing fairly large cliques to form while the average vertex degree is
kept small. Visualizations of our model networks with proper parametrization
exhibit clear evidence of community structure, as shown in Fig. 2.

Fig. 5. The average number of k-clique-communities (•: k = 3, o: k = 4, *: k = 5) of
each size found in our model network with $N = 50\,000$, number of initial connections
$p(n_{init} = 1) = 0.95$, $p(n_{init} = 2) = 0.05$, and number of secondary connections
from $U[0, 3]$, averaged over 20 networks. In the case of 3-cliques, large communities
spanning roughly half the network are seen. The community size distributions are
broad, and their log-log plots appear power-law-like, although the cumulative
distributions (not shown) show some deviation. Approximate slopes of the log-log plots
are k = 3: 3 (excluding the supercommunities), k = 4: 4, and k = 5: 10. A very large
3-clique-community spans roughly half of the vertices in any network generated with
these parameters. In the corresponding randomized networks, where edges were
shuffled keeping the degree distribution intact, there were only a few adjacent triangles,
and no 4-cliques at all (□: 3-clique-communities found in the randomized networks).
The inset shows the effect of network size N on the 3-clique-community size
distribution for N = 100, 500, 1000, 5000, 10000, 50000. As all data fit on the same line
when scaled by 1/N, the network size does not affect the slope. Note that different
choices of parameters would allow larger cliques and larger k-clique-communities to
form.



In order to quantify the community structure, we have utilized the k-clique
method of Palla et al. […] and the free software package CFinder they
provide. In this approach, the definition of communities is based on the
observation that a typical community consists of several fully connected subgraphs
(cliques) that tend to share many of their vertices. Thus, a k-clique-community
is defined as a union of all k-cliques that can be reached from each other
through a series of adjacent k-cliques (where adjacency means sharing k − 1
vertices). This definition determines the communities uniquely, and one of its
strengths is that it allows the communities to overlap, i.e. a single vertex can
be a member of several communities. This is especially appropriate for social
networks, where an individual typically belongs to several groups.
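For k = 3, the definition above reduces to connecting triangles that share an edge (k − 1 = 2 vertices). The Python sketch below implements this special case with a union-find structure; it illustrates the definition, is not the CFinder implementation, and does not handle k > 3. The function name is hypothetical.

```python
from itertools import combinations


def triangle_communities(adj):
    """3-clique-communities of a graph given as a list of neighbour sets."""
    # Enumerate each triangle once as (a, b, c) with a < b < c.
    triangles = []
    for a, nbrs in enumerate(adj):
        higher = sorted(u for u in nbrs if u > a)
        for b, c in combinations(higher, 2):
            if c in adj[b]:
                triangles.append((a, b, c))

    # Union-find over triangle indices.
    parent = list(range(len(triangles)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Two triangles are adjacent iff they share an edge; merge all triangles
    # containing the same edge into one group.
    first_on_edge = {}
    for t, (a, b, c) in enumerate(triangles):
        for edge in ((a, b), (a, c), (b, c)):
            if edge in first_on_edge:
                parent[find(t)] = find(first_on_edge[edge])
            else:
                first_on_edge[edge] = t

    # A community is the union of the vertices of one group of triangles.
    groups = {}
    for t, tri in enumerate(triangles):
        groups.setdefault(find(t), set()).update(tri)
    return list(groups.values())
```

Because a vertex can belong to triangles in different groups, the returned vertex sets may overlap, as the definition allows.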



We have found that the size distributions of k-clique-communities in our model
networks are broad and appear power-law-like (Fig. 5). The slopes of the
log-log plots were found not to depend on the network size N. In the case
of 3-cliques, a very large community spans roughly half of the vertices in
any network generated with these parameters. Similar large 3-clique-communities can be
observed in many other networks with communities as well, e.g. in the datasets
provided with the CFinder package: a snapshot of the co-authorship network
of the Los Alamos e-print archives, where 54% of the roughly 30 000 vertices
belong to the largest 3-clique-community; the word association network of
the South Florida Free Association norms (67%); and the protein-protein
interaction network of Saccharomyces cerevisiae (17%). The requirements
for a 3-clique-community are not very strict, and it is therefore not surprising that one
community can span most of the network. With these choices of parameters,
no such supercommunities arise for k > 3.



Comparison of the resulting community size distributions with those of randomized
networks, where the edges of the networks were scrambled keeping the degree
distributions intact, makes it evident that community structure is present in
the model networks (Fig. 5). Community sizes depend on i) how the
communities are defined and detected, as different methods divide the networks
into differently sized communities, and ii) what type of social network is
investigated, as different types of networks can be expected to display
different community structures. Although analysis of the community structure
of empirical social networks is a relevant question, we leave it for future
work. Our aim is to provide a generic model that can be tuned to exhibit desired
properties.
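The randomized reference networks keep every vertex degree while destroying local structure. One standard way to do this, assumed here since the text does not specify the procedure, is repeated double-edge swaps: two edges $(a, b)$ and $(c, d)$ are replaced by $(a, d)$ and $(c, b)$ whenever this creates neither a self-loop nor a multiple edge. The function name, the number of swaps and the attempt cap are hypothetical choices.

```python
import random


def randomize_preserving_degrees(adj, n_swaps, rng_seed=None):
    """Degree-preserving rewiring of a list of neighbour sets by double-edge swaps.

    Returns a new adjacency list; the input is not modified. At most
    100 * n_swaps attempts are made (an arbitrary cap).
    """
    rng = random.Random(rng_seed)
    adj = [set(nbrs) for nbrs in adj]
    edges = [(u, v) for u in range(len(adj)) for v in sorted(adj[u]) if u < v]
    if len(edges) < 2:
        return adj
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # randomize the orientation of the second edge
        # Reject swaps that would create self-loops or duplicate edges.
        if len({a, b, c, d}) < 4 or d in adj[a] or b in adj[c]:
            continue
        adj[a].remove(b); adj[b].remove(a)
        adj[c].remove(d); adj[d].remove(c)
        adj[a].add(d); adj[d].add(a)
        adj[c].add(b); adj[b].add(c)
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return adj
```

Each accepted swap removes one edge end from, and adds one to, each of the four vertices involved, so every degree is unchanged.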
3 Summary



In this paper we have developed a model which very efficiently produces
networks resembling real social networks in that they have assortative degree
correlations, high clustering, short average path lengths, broad degree
distributions and prominent community structure. The model is based on network
growth by two processes: attachment to random vertices and attachment to
their neighbourhood. Theoretical approximations for the degree distribution
and clustering spectrum have been derived and compared with simulation
results. The observed deviations can be attributed to degree correlations.
Visualizations of the networks and quantitative analysis show significant community
structure. In terms of communities defined using the k-clique method, the
analyzed community size distributions display power-law-like tails. These types
of features are also present in many real-life networks, making the model well
suited for simulating dynamic phenomena on social networks.